



DIFFERENTIAL ITEM FUNCTIONING 
FOR MINORITY EXAMINEES ON THE SAT 


Alicia P. Schmitt 
Neil J. Dorans 



Educational Testing Service 
Princeton, New Jersey 
May 1988 





An earlier version of this paper was presented at the Annual Meeting of the
American Psychological Association, New York, August 1987.

Funding for this research was provided by the Admissions Testing Program of the 
College Board and Educational Testing Service. The interpretations of results 
represent the opinions of the authors and should not be viewed as statements of 
ETS or the College Board. 



Copyright © 1988. Educational Testing Service. All rights reserved.



Abstract 


The standardization approach to assessing differential item functioning 
(DIF), including standardized distractor analysis, is described. The results of 
studies conducted on Asian-Americans, Hispanics (Mexican-Americans and Puerto 
Ricans), and Blacks on the Scholastic Aptitude Test are described and then 
synthesized across studies. Where the groups were limited to include only 
examinees who spoke English as their best language, very few items across forms 
and ethnic groups exhibited large DIF. Major findings include evidence of 
differential speededness for Blacks and Hispanics, and when the item content is 
of special interest, advantages for the relevant ethnic group. In addition,
homographs tend to disadvantage all three ethnic groups, but the effects of
vertical relationships are not as consistent. Although these findings are
important in understanding DIF, they do not seem to account for all differences.
Other variables related to DIF still need to be identified. Furthermore, these
findings are seen as tentative until corroborated by studies using controlled
data collection designs.




The Scholastic Aptitude Test (SAT) is designed for college-bound juniors and 
seniors and is intended to provide a common standard for evaluation of students' 
general academic preparedness for college. It is neither intended, nor rightly 
used, as the sole criterion in college admissions. Rather, SAT scores provide 
information to supplement student academic records from different schools, among 
which grading practices and curriculums vary considerably. 

Those who develop and review the SAT are aware of the diversity of the 
test-taking population and attempt to construct tests based on a broad sampling 
of tasks and topics that tends not to favor any subgroup of the population. 

Donlon (1981) described the checks that are performed on the SAT to guard against 
favoritism toward any subgroup. In that article, Donlon summarized procedures 
used in the test development process to ensure that items or test questions are 
appropriate for various subgroups. 

Since 1974 the College Board Admissions Testing Program has conducted a 
number of studies that have used statistical methods to examine the performance 
of different subpopulations on final form SAT items. The purpose of these 
studies has been to monitor differential item functioning in order to (1) ensure 
that the SAT remains appropriate over time for major subgroups of the SAT 
candidate population, and (2) identify possible content, format, or 
administrative factors related to differential item functioning to help test 
developers construct fair tests. 

The authors gratefully acknowledge the assistance provided by Karen Carroll and
Karen Damiano.





Carlton and Marco (1982) and Donlon (1984) reviewed the methods used by 
these studies. Dorans (1982) presented a technical review of the studies 
conducted between 1975 and 1979. Since 1983, with the introduction of the 
standardization methodology by Dorans and Kulick (1983), standardization has been 
used to study final-form SAT items for possible unexpected differential item 
functioning between subgroups that have been matched with respect to the ability 
measured by the test. 

In this paper, we will review what has been found when the standardization 
procedure has been applied to assess the differential item functioning (DIF) of 
minority examinees. We cite older studies that have been summarized in either 
Dorans and Kulick (1986), or Schmitt and Dorans (1987), include the results of 
more recent studies, such as Schmitt (1988) and Schmitt and Bleistein (1987), and 
present some new findings that have not appeared elsewhere. Prior to presenting 
the findings, we briefly describe the standardization approach to assessing 
differential item functioning, drawing on descriptions given in Dorans and Kulick 
(1983) and Dorans (1987). We include a description of the standardization 
approach to distractor analysis. After summarizing earlier findings, we 
synthesize these findings across minority subgroups and in the last section 
discuss future directions. We begin, however, with a brief description of the 
SAT. 


CURRENT SAT CONTENT SPECIFICATIONS 

Each SAT test book is divided into six separately timed 30-minute sections: 
2 SAT-Verbal sections (a total of 85 questions); 2 SAT-Mathematical sections (a 
total of 60 questions); 1 Test of Standard Written English section (50 
questions); 1 variable section which does not count toward students' scores. 





The 85 questions in the two verbal sections of the SAT are of four
types: antonyms, 25 questions; analogies, 20 questions; sentence completion, 15
questions; and reading comprehension, 25 questions. Antonym questions are used
to test breadth and depth of vocabulary. Analogies test a student's ability to 
establish relationships between pairs of words and to recognize similar or 
parallel relationships in other pairs. The antonym and analogies questions 
comprise the vocabulary portion of the verbal section. 

The reading portion of the verbal section is composed of 15 sentence 
completion and 25 reading comprehension items. Sentence completion questions 
test a student's ability to recognize logical relationships among parts of a 
sentence. Sentences are given in which one or two words have been omitted. The 
correct answer is the word or set of words that, when inserted in the blanks, 
best fits the meaning of the sentence as a whole. Reading comprehension 
questions are based on reading selections that have been adapted from published 
materials to make them suitable for testing purposes. The selections vary in 
length (typically between 200 and 450 words) and in content. Reading questions 
test comprehension at several levels. Some questions ask the student to 
recognize a restatement of specific information contained in the passage; others 
ask the student to recognize main ideas and supporting details, to make 
inferences on the basis of the passage, to analyze the arguments used by the 
author, to recognize tone or attitude, or to make generalizations from 
information in the passage. 

Questions in the mathematical sections of the test are designed to measure 
abilities related to college-level work in the liberal arts, sciences, 
engineering, and other fields requiring mathematics. The tasks posed on the test 
are designed to assess how well students understand mathematics, how well they 
can apply what is already known to new situations, and how well they can use what
they know to solve nonroutine problems. The test content is almost equally
divided among arithmetic reasoning, algebra, and geometry with a few 
miscellaneous questions that cannot be classified in any of the three areas. For 
example, questions testing logical reasoning or the ability to understand and 
apply a new mathematical definition are classified as miscellaneous. 

The mathematics questions are presented in two formats: regular multiple- 
choice (40 questions) and quantitative comparison (20 questions). The regular 
multiple-choice question is the type now familiar to most test takers. The 
quantitative comparison questions emphasize the concepts of inequalities and 
estimation. 

STANDARDIZATION METHODOLOGY 

An item is exhibiting differential item functioning when the probability of 
correctly answering the item is lower for examinees from one group than for 
examinees of equal ability from another group or groups. This definition may be 
formalized mathematically by letting S represent developed ability as measured by 
total score on the standard College Board 200-to-800 SAT scale (or on the 
20-to-60 TSWE scale), and X represent an item score (1 if the answer to the 
question is correct and 0 if the answer is incorrect). An item, then, is free of 
differential item functioning (DIF) when it satisfies the following equality: 

$$P_g(X = 1 \mid S) = P_{g'}(X = 1 \mid S) \quad \text{for all subpopulations } g \text{ and } g',$$

where $P_g(X = 1 \mid S)$ is defined as the probability that candidates from
subpopulation g who have total test scores equal to S will answer the item
correctly. For example, if Black and White candidates with the same total test 
scores do not have equal probabilities of successful performance on the item, 
then this difference in probabilities is taken as evidence of differential item 
functioning for Black and White candidates at this score level. A lack of DIF
implies that there are no differences in conditional item performance across
subgroups when the requisite condition before comparison is identical total test 
score. Note that with DIF the focus is on differences between candidates of 
equal score level, among whom one would not expect to find any differences. This 
represents an important distinction from observed differences in item performance 
between groups of varying ability, which is known as impact (Dorans, 1987;
Holland & Thayer, 1986), where some differences are expected.

Previous methods used to appraise differential item functioning typically 
have been hampered by sensitivities to differences in overall subpopulation 
ability or differences in item quality (discrimination). The standardization 
methodology, however, controls for differences in both subpopulation ability and 
in item quality. Standardization is used here to mean that differences on one 
variable have been controlled for, prior to making comparisons between groups on 
some related variable. 

A general approach to assessing DIF via standardization is described in 
detail in Dorans and Kulick (1983) and Dorans (1987). The essential features of 
the method as applied to the SAT are as follows: Using the standard College 
Board 200-800 SAT scale, one can establish 61 ability levels (200, 210, 220, ...,
800). The probability that an examinee at a given ability level will correctly 
answer an item can be estimated by the observed proportion correct among those 
with the given scaled score. Studies of DIF focus on differences between two 
groups. One group is designated the base (b) or reference group. This group is 
used to estimate the conditional probability of successful item performance at 
each given score level. Usually the group that provides the most stable 
estimates of the conditional probabilities across the entire scaled score range 
is selected as the base group. Typically, but not always, this is the largest 
group. The remaining groups are referred to as focal (f) groups or study groups. 
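The estimation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function name `conditional_proportions`, the record layout, and the toy records are all hypothetical.

```python
from collections import defaultdict

# The 61 reporting levels of the 200-to-800 scale, in steps of 10.
SCORE_LEVELS = list(range(200, 801, 10))

def conditional_proportions(records):
    """Map each score level to the observed proportion answering correctly.

    `records` is an iterable of (scaled_score, item_score) pairs, where
    item_score is 1 for a correct answer and 0 otherwise. Levels with no
    examinees are left out rather than given a made-up value.
    """
    n = defaultdict(int)       # examinees observed at each level
    right = defaultdict(int)   # correct answers observed at each level
    for score, x in records:
        n[score] += 1
        right[score] += x
    return {s: right[s] / n[s] for s in SCORE_LEVELS if n[s] > 0}

# Invented toy records for one item in a hypothetical base group.
base_records = [(400, 1), (400, 0), (400, 1), (400, 1), (500, 1), (500, 1)]
p_bs = conditional_proportions(base_records)
```

With real data, sparse score levels give unstable proportions, which is one reason the text recommends choosing the group with the most stable estimates as the base group.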





The most detailed measure of differential item performance is at the
individual scaled score level, $D_s = P_{fs} - P_{bs}$. Plots of these differences, as
well as plots of $P_{fs}$ and $P_{bs}$, are helpful to visualize the quantification of
unexpected differential item performance (see Figures 1 and 2). Figure 1 depicts
an SAT-Verbal item that is performing fairly for both groups. Figure 2 portrays
an SAT-Verbal item that is unexpectedly easy for Blacks. The left-hand panels in
the figures (1a and 2a) present the conditional probabilities of successful item
performance for both base and focal groups. These curves may also be thought of
as nonparametric item-test regressions or empirical item characteristic curves.
The right-hand panels in the figures (1b and 2b) are plots of the differences in
conditional probabilities observed above.
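The score-level differences plotted in the right-hand panels can be computed as follows. This is a hedged sketch: the function name `conditional_differences` and the proportions are invented for illustration, and restricting the comparison to levels observed in both groups is an assumption of the sketch.

```python
def conditional_differences(p_fs, p_bs):
    """Return focal-minus-base differences in proportion correct.

    p_fs and p_bs map score level -> proportion correct for the focal and
    base groups. Only levels present in both groups are compared.
    """
    shared = sorted(set(p_fs) & set(p_bs))
    return {s: p_fs[s] - p_bs[s] for s in shared}

# Invented proportions for a single item.
p_fs = {400: 0.50, 500: 0.70, 600: 0.90}
p_bs = {400: 0.50, 500: 0.60, 600: 0.85, 700: 0.95}
d_s = conditional_differences(p_fs, p_bs)
```

A positive difference at a level means the item was easier for the focal group than for base group members with the same scaled score.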


Insert Figures 1 and 2 about here 


Standardization's Item Discrepancy Indices 

On an edition of the SAT, there are 85 verbal items, 60 mathematics items, 
and 50 TSWE items that are scored to produce the SAT-Verbal, SAT-Mathematical, 
and TSWE scores reported to students and designated institutions. Hence, there 
are 195 plots of $P_{fs}$, $P_{bs}$, and $D_s$ per operational edition of the SAT. There is an
obvious need for numerical flags that point toward potential problem items, 
indices that target items with plots like those in Figure 2 for special review 
while allowing items with plots like those in Figure 1 to pass quickly through 
the screening process. Standardization has two such flags, the standardized 
p-difference (DSTD) and the root mean weighted squared difference (RMWSD). Only 
the standardized p-difference is described here. Both indices use a weighting 
function supplied by the standardization group. The standardization approach 
derives its name from the standardization group. The function of the
standardization group is to supply a set of weights, one for each score level,
that will be used to weight each of the individual $D_s$ before accumulating these
weighted differences across score levels to arrive at a summary item discrepancy
index.

The standardized p-difference. The standardized p-difference is defined as
follows:

(1)  $$\mathrm{DSTD} = \sum_{s=1}^{S} K_s \,(P_{fs} - P_{bs}) \Big/ \sum_{s=1}^{S} K_s ,$$

where $K_s / \sum_s K_s$ is the weighting factor at score level s supplied by the
standardization group to weight differences in performance between the focal group
($P_{fs}$) and the base group ($P_{bs}$). Note that in contrast to impact, in which each group
has its relative frequency serve as a weight at each score level,

(2)  $$\mathrm{IMPACT} = P_f - P_b = \sum_{s=1}^{S} N_{fs} P_{fs} \Big/ \sum_{s=1}^{S} N_{fs} \;-\; \sum_{s=1}^{S} N_{bs} P_{bs} \Big/ \sum_{s=1}^{S} N_{bs} ,$$

DSTD uses a standard or common weight on both $P_{fs}$ and $P_{bs}$, namely $K_s / \sum_s K_s$. The
use of the same weight on both $P_{fs}$ and $P_{bs}$ is the essence of standardization.
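The contrast between the standardized p-difference and impact can be illustrated with a small Python sketch. The function names and the three-level toy data are hypothetical; the example is built so that the two groups have identical conditional proportions but different score distributions.

```python
def dstd(k, p_f, p_b):
    """Standardized p-difference with a common weight k[s] at each level."""
    return sum(w * (f - b) for w, f, b in zip(k, p_f, p_b)) / sum(k)

def impact(n_f, p_f, n_b, p_b):
    """Unstandardized difference: each group is weighted by its own counts."""
    overall_f = sum(n * p for n, p in zip(n_f, p_f)) / sum(n_f)
    overall_b = sum(n * p for n, p in zip(n_b, p_b)) / sum(n_b)
    return overall_f - overall_b

# Invented data: same conditional proportions, different score distributions.
p = [0.2, 0.5, 0.8]   # proportion correct at three score levels, both groups
n_f = [60, 30, 10]    # focal group concentrated at the low levels
n_b = [10, 30, 60]    # base group concentrated at the high levels

item_dstd = dstd(n_f, p, p)
item_impact = impact(n_f, p, n_b, p)
```

In this toy case the item shows a sizable impact (about -.30) even though matched examinees perform identically, so the standardized p-difference is zero.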

Within the general standardization framework, the choice of values for $K_s$ is
up to the investigator. Some plausible options are the following:

o $K_s = N_{ts}$, the number of people at s in the total group;

o $K_s = N_{bs}$, the number of people at s in the base group;

o $K_s = N_{fs}$, the number of people at s in the focal group; or

o $K_s$ = the relative number of people in some standard reference group, for
example, a 3-year rolling norms group for the SAT.





In practice, $K_s = N_{fs}$ has been used because it gives the greatest weight to
differences in $P_{fs}$ and $P_{bs}$ at those score levels most attained by the focal group
under study. In other words, to date, DSTD has been defined as the difference
between $P_f$, the observed performance of the focal group on the item, and $\hat{P}_f$,
which can be thought of as either the expected performance of the focal group
predicted from the base group item test regression curve, $P_{bs}$, or the imputed
performance of selected base group members who are matched in ability to the
focal group.
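The interpretation of DSTD as observed minus expected focal group performance follows directly from using focal group counts as weights, and a short sketch makes the identity concrete. The helper `weighted_mean` and all numbers below are invented for illustration; the base-weighted value is included only to show that the choice of weights changes the index.

```python
def weighted_mean(weights, values):
    """Weighted average of values across score levels."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Invented counts and proportions at three score levels.
n_f = [50, 30, 20]          # focal group counts
n_b = [20, 30, 50]          # base group counts
p_f = [0.30, 0.50, 0.80]    # focal proportions correct
p_b = [0.40, 0.55, 0.80]    # base proportions correct
diffs = [f - b for f, b in zip(p_f, p_b)]

observed_f = weighted_mean(n_f, p_f)   # observed focal performance
expected_f = weighted_mean(n_f, p_b)   # base curve applied to focal score distribution
dstd_focal_weights = weighted_mean(n_f, diffs)
dstd_base_weights = weighted_mean(n_b, diffs)
```

Here the focal-weighted index is about -.065, equal to observed minus expected performance, while base group weights give about -.035 because the differences sit at score levels where few base group members are found.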

Cutoffs for DSTD 

Two cutoffs are used for DSTD. Experience indicates that a value of 
|DSTD|>.05 will flag a relatively large number of items for review, most of which 
will be deemed acceptable upon closer examination. In contrast we have found 
that |DSTD|>.10 flags relatively few items, all of which require careful 
examination that sometimes leads to the conclusion that the item is biased. For 
operational purposes, |DSTD|>.10 is the recommended flag; for research purposes,
|DSTD|>.05 is used. The items in Figures 1 and 2, with DSTD values of .00 and .20,
respectively, are included as examples. 
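The two cutoffs can be expressed as a simple screening rule. The sketch below is illustrative only; the function name `flag_item` and the label strings are hypothetical, while the thresholds are the .05 and .10 values given above.

```python
RESEARCH_CUTOFF = 0.05      # flags many items, most acceptable on review
OPERATIONAL_CUTOFF = 0.10   # flags few items, all needing careful review

def flag_item(dstd_value):
    """Classify an item by the absolute size of its standardized p-difference."""
    magnitude = abs(dstd_value)
    if magnitude > OPERATIONAL_CUTOFF:
        return "operational"
    if magnitude > RESEARCH_CUTOFF:
        return "research"
    return "none"

figure_1_flag = flag_item(0.00)   # the Figure 1 item
figure_2_flag = flag_item(0.20)   # the Figure 2 item
```

Because the rule uses the absolute value, items that are unexpectedly easy and items that are unexpectedly hard for the focal group are flagged alike.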

Standardized Distractor Analyses 

The general standardization approach described in Dorans and Kulick (1983) 
is readily adapted to distractor analysis. Standardization operates on a large 
item-by-score level-by-option-by-group contingency table that contains counts of 
the number of examinees at each score level from each focal and base group who
selected each option (A, B, C, D, E), omitted the item, or did not reach the item.

Using standardization to obtain standardized response rates for distractors, 
omits and not reached is straightforward. One simply goes into the contingency 
table and pulls out the appropriate counts needed for the distractor analysis. 





For example, for an option A analysis, one can compute a difference in
standardized response rates via a three-step process. First, the proportions
choosing option A in the focal and base groups at each score level are computed,

(3)  $$P_{fs}(A) = A_{fs} / N_{fs} ; \qquad P_{bs}(A) = A_{bs} / N_{bs} ,$$

where $A_{fs}$ and $A_{bs}$ are the number of people in the focal and base groups,
respectively, at score level s who choose option A. The next step is to compute
the conditional difference in choosing A between the focal and base groups,

(4)  $$D_s(A) = P_{fs}(A) - P_{bs}(A) .$$

Next these differences are summarized across score levels via

(5)  $$\mathrm{DSTD}(A) = \sum_{s=1}^{S} K_s \,[P_{fs}(A) - P_{bs}(A)] \Big/ \sum_{s=1}^{S} K_s .$$

Note that, when A is the correct response, $A_{fs}$ and $A_{bs}$ are the numbers
answering correctly in the focal and base groups, and equation (5)
becomes equation (1). Also note that equations (3), (4) and (5) have parallels
for options B, C, D, and E and Omits and Not Reached. For example, the
standardization approach for Not Reached culminates in a standardized not-reached
difference,

(6)  $$\mathrm{DSTD}(NR) = \sum_{s=1}^{S} K_s \,[P_{fs}(NR) - P_{bs}(NR)] \Big/ \sum_{s=1}^{S} K_s .$$

This index can be used to assess the degree of differential speededness, i.e., 
the degree to which the focal group responds to items at a different rate than 
matched members from the reference group. 
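The three-step distractor analysis can be sketched for every response category at once. The code below is a hedged illustration, not the operational procedure: the data layout, the function name, the use of focal group counts as weights, and the toy counts are all assumptions made for the example.

```python
RESPONSES = ["A", "B", "C", "D", "E", "OMIT", "NR"]

def standardized_response_differences(focal, base):
    """Standardized focal-minus-base response rate for each response category.

    focal and base map score level -> {response: count}. Focal group counts
    at each level serve as the standardization weights (an assumption here).
    """
    levels = [s for s in sorted(set(focal) & set(base))
              if sum(focal[s].values()) > 0 and sum(base[s].values()) > 0]
    n_f = {s: sum(focal[s].values()) for s in levels}
    n_b = {s: sum(base[s].values()) for s in levels}
    weight_total = sum(n_f.values())
    result = {}
    for r in RESPONSES:
        weighted = sum(
            n_f[s] * (focal[s].get(r, 0) / n_f[s] - base[s].get(r, 0) / n_b[s])
            for s in levels
        )
        result[r] = weighted / weight_total
    return result

# Invented counts at two score levels; option A is taken to be the key.
focal = {400: {"A": 20, "B": 10, "NR": 10}, 500: {"A": 30, "B": 5, "NR": 5}}
base = {400: {"A": 60, "B": 30, "NR": 10}, 500: {"A": 80, "B": 15, "NR": 5}}
diffs = standardized_response_differences(focal, base)
```

Because the response proportions sum to one within each group at each score level, the standardized differences sum to zero across categories. In this toy table the deficit on the key (about -.075) is largely offset by a positive not-reached difference (about .1125), the pattern that would suggest differential speededness.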





Standardization and the Mantel-Haenszel Method 

Educational Testing Service has adopted the Mantel-Haenszel (Mantel &
Haenszel, 1959) method for flagging items that exhibit DIF. The DIF application
of this procedure is described in detail in Holland and Thayer (1988). Dorans
(1987) describes both the Mantel-Haenszel approach and the standardization 
approach as contingency table approaches that are highly related to each other. 
Wright (1987) demonstrates that the two procedures produce virtually identical 
orderings of SAT items with respect to amount of DIF when the procedures measure 
DIF in the same metric. Hence, it is quite likely that similar results would 
have been obtained with the Mantel-Haenszel approach if it had been available at 
the time when most of the studies reported herein were conducted. 
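For readers who want to see the contingency table form of the Mantel-Haenszel approach, a minimal sketch follows. The formulas are not given in this paper: the code uses the standard Mantel-Haenszel common odds ratio estimator and the delta-metric transformation (-2.35 times the natural log of the odds ratio) associated with the Holland and Thayer application. The function names and the toy tables are hypothetical.

```python
import math

def mantel_haenszel_alpha(tables):
    """Common odds ratio across score levels.

    tables holds one (base_right, base_wrong, focal_right, focal_wrong)
    tuple per score level; empty levels are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        if total == 0:
            continue
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator

def mh_d_dif(alpha):
    """Delta-metric index; negative values mean the item is harder for the focal group."""
    return -2.35 * math.log(alpha)

# Invented tables: equal odds of success in both groups at every level.
no_dif = [(60, 40, 30, 20), (80, 20, 40, 10)]
# Invented tables: base group has higher odds of success at the first level.
some_dif = [(80, 20, 30, 20), (80, 20, 40, 10)]

alpha_no_dif = mantel_haenszel_alpha(no_dif)
alpha_some_dif = mantel_haenszel_alpha(some_dif)
```

The odds ratio scale differs from the proportion scale of DSTD, which is why comparisons of the two procedures are made after expressing DIF in the same metric.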

DIFFERENTIAL ITEM FUNCTIONING BY ETHNIC GROUP 

The results of differential item functioning studies for three ethnic groups 
are summarized in this section. The three ethnic groups are Asian-Americans, 
Hispanics, and Blacks. 

Asian-Americans 

Two studies have been conducted on Asian-Americans (Kulick & Dorans, 1983;
Bleistein & Wright, 1987). In Kulick and Dorans (1983), 2,616 Asian-American
examinees from the November 1980 administration of the SAT served as the focal 
group, while 65,942 White examinees served as the base group. In Bleistein and 
Wright (1987), the base group was composed of 278,099 White examinees who took 
the November 1983 form of the SAT, and there were two focal groups, 
Asian-Americans who did not speak English as their best language (N=3,314) and
Asian-Americans who spoke English as their best language (N=9,890). Bleistein
and Wright (1987) synthesized these two studies and focused on the major
substantive findings. 

SAT-Verbal. In the Kulick and Dorans (1983) study, 14 of 85 verbal items
had absolute values of DSTD that exceeded .05, which represents a large number of 
items to exhibit DIF. (As reported by Dorans and Kulick (1986), earlier DIF 
studies had flagged very few items for DIF.) The authors hypothesized that the 
relatively large number of items with DIF for Asian-Americans could be attributed 
in part to the sizable percentage of Asian-Americans for whom English was not 
their best language. The Bleistein and Wright (1987) study confirmed the 
language hypothesis proposed by Kulick and Dorans (1983). For the
English-best-language Asian-American group, only six items had DSTD values that exceeded .05
in absolute value. In contrast, the non-English-best-language Asian-American 
group had 31 items with DSTD values greater than .05 in absolute value. These 
results dramatized the need for including only examinees who speak English as 
their best or first language in DIF studies. 

SAT-Mathematical. The results obtained for SAT-Mathematical also reflect
the influence of language skills. In Kulick and Dorans (1983), 16 of 50 items 
had sufficient DIF to be flagged for close review. Of those items flagged for 
review, those with negative DIF tended to be mathematics items with a high 
"verbal-load", while positive DIF items tended to be "pure math" questions. In 
the Bleistein and Wright (1985) study, 25 of 60 items were flagged in the 
non-English-best-language Asian-American group. In contrast, only six items were 
flagged in the English-best-language focal group of Asian-Americans. As with 
SAT-Verbal, these results indicate that a basic amount of English proficiency is 
required before SAT items measure unitary dimensions, which is a technical 
prerequisite for DIF procedures that are based on detecting departures from 
unidimensionality for subgroups of the population. 





Hispanics 

Schmitt (1988) used the standardization procedure to study DIF on the SAT 
for Hispanics, in particular Mexican-Americans and Puerto Ricans who reside in
the continental United States and who speak English as their best language. Two 
forms of the SAT were studied, the forms administered in November 1983 and 
November 1984. The November 1983 analyses involved 278,166 Whites as the base 
group, a focal group of 2,963 Mexican-Americans, and a focal group of 3,230 
Puerto Rican candidates. The November 1984 analyses were based on 285,885 
Whites, 3,456 Mexican-Americans, and 3,384 Puerto Rican candidates. 

Since analyses for the SAT-Mathematical test did not demonstrate much DIF 
for either Hispanic group, Schmitt focused her analyses on the SAT-Verbal test 
(Schmitt, 1985; 1988). In the November 1983 analyses, 8 and 12 items out of 85
SAT-Verbal items were flagged for DSTD values greater than .05 in absolute value
for Mexican-Americans and Puerto Ricans, respectively. In the November 1984
analyses, the numbers of flagged items for Mexican-Americans and Puerto Ricans
were 14 and 16, respectively. It should be noted that most of the flagged items
across both Hispanic groups and across both studies had DSTD values whose 
absolute magnitudes were smaller than .10. Of the four item types on the SAT, 
analogy items exhibited the most negative DIF for both Hispanic groups. 

Schmitt performed an in-depth analysis of the flagged items from the 
November 1983 analysis to identify characteristics of items that might explain 
DIF, and generated some testable hypotheses. To assess the hypotheses, each 
verbal item across the two November SAT forms was classified by at least two 
bilingual judges. True cognates (words with a common English and Spanish root), 
false cognates (words where the meaning is not the same in both languages), and 
homographs (words that are spelled alike but have different meanings) were
identified at the stem, key and distractor level for each item. Items that might
have special interest to either Hispanic group were also identified. In addition 
to thorough item content analyses, correlations between these item 
characteristics (cognates, homographs, and interest) and the DSTD values for all 
items were computed. Due to the infrequent occurrence of homographs and false
cognates, correlations for these variables were not computed.
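The correlational step can be illustrated with a plain Pearson correlation between an item characteristic and the DSTD values. This is a sketch under assumptions: the paper does not state which correlation coefficient was used, the 0/1 indicator coding is hypothetical, and the five data points are invented solely to make the code run.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented example: 1 if an item contains a true cognate, else 0.
has_true_cognate = [1, 0, 0, 1, 0]
# Invented DSTD values for the same five items.
dstd_values = [0.06, -0.02, 0.00, 0.04, -0.03]

r = pearson_r(has_true_cognate, dstd_values)
```

When a characteristic occurs in very few items, the indicator has almost no variance and the coefficient becomes unstable, which is consistent with the decision above not to compute correlations for the rare characteristics.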

On the basis of these analyses, Schmitt found evidence to support two 
hypotheses. First, items with true cognates will tend to have positive DIF that 
favors Hispanic examinees. Second, items with content of special interest to 
Hispanics will tend to have positive DIF. In addition, item content analyses 
showed that items with negative DIF tended to have homographs, false
cognates, or both.

Blacks 

There have been several studies using the standardization approach in which 
the focal group for DIF analyses has been Blacks (Kulick, 1984; Rogers, Dorans & 
Schmitt, 1986; Rogers & Kulick, 1987; Schmitt & Bleistein, 1987). Rogers and 
Kulick (1987) summarized the results from earlier studies across three forms of 
the SAT that were administered in November 1980, November 1983, and November 
1984. Rogers and Kulick reported 14, 11, and 11 (out of 85) SAT-Verbal items 
with absolute values of DSTD greater than .05 in their analyses of the November 
1980, November 1983, and November 1984 forms, respectively. Of the four item 
types, the analogy items exhibited the most incidences of DIF and the most 
incidences of negative DIF, while the reading passage items tended to exhibit the 
most positive DIF. For SAT-Mathematics, Rogers and Kulick reported 7, 7, and 4
flagged items (out of 60) for the November 1980, November 1983, and November 1984 forms,
respectively. Of those flagged, the preponderance were regular mathematics
items, and explanations for DIF were difficult to generate for the flagged items. 

The existence of negative DIF for Blacks on SAT-Verbal analogy items was 
consistent with research found on other tests (Scheuneman, 1981) and with earlier 
research conducted on the SAT (Dorans, 1982). Echternacht (1972) found this with 
the Admission Test for Graduate Study in Business, Scheuneman (1978) with a 
pretest version of the Otis-Lennon School Ability Test, and Stricker (1982) with
the Graduate Records Examination. Schmitt and Bleistein (1987) conducted an 
investigation of DIF for Blacks on SAT-Verbal analogy items in an effort to 
identify factors that might contribute to the DIF. Possible factors were drawn 
from the literature on analogical reasoning and previous DIF research on Blacks. 

The research was performed in two steps. Hypotheses about analogical DIF 
were developed after close examination of the three 85-item SAT-Verbal test forms
studied by Rogers and Kulick (1987). Following these analyses, two additional 
test forms were studied to validate the hypothesized factors. Standardization 
analyses were conducted on the key, all distractors, not reached and omits. Two 
types of treatment of not reached examinees were employed in the standardization 
analyses. In the standard analysis, candidates who did not reach the item were 
included in the calculation of standardized response rates, labelled in this
study as $\mathrm{DSTD}_1$. In the supplementary analysis, candidates who did not reach the
item were excluded from the calculation of the standardized response rates,
leading to $\mathrm{DSTD}_2$.
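The two treatments of not-reached examinees can be compared in a short sketch. The function name, the data layout, and the counts are invented; in particular, dropping not-reached examinees from the weights as well as from the proportions is an assumption of this sketch, since the text specifies only that they are excluded from the response rates.

```python
def dstd_for_key(focal, base, exclude_not_reached):
    """Standardized p-difference for the keyed response.

    focal and base are aligned lists with one (right, n, not_reached) tuple
    per score level. When exclude_not_reached is True, examinees who did not
    reach the item are removed from the proportions and from the focal
    group weights (the weight treatment is an assumption of this sketch).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (r_f, n_f, nr_f), (r_b, n_b, nr_b) in zip(focal, base):
        if exclude_not_reached:
            n_f, n_b = n_f - nr_f, n_b - nr_b
        if n_f == 0 or n_b == 0:
            continue  # no usable examinees at this score level
        weighted_sum += n_f * (r_f / n_f - r_b / n_b)
        weight_total += n_f
    return weighted_sum / weight_total

# Invented counts: focal examinees who reach the item do as well as the base
# group, but 10 of every 50 focal examinees never reach it.
focal = [(28, 50, 10), (34, 50, 10)]
base = [(70, 100, 0), (85, 100, 0)]

dstd_1 = dstd_for_key(focal, base, exclude_not_reached=False)
dstd_2 = dstd_for_key(focal, base, exclude_not_reached=True)
```

In this toy case the standard index is about -.155 while the supplementary index is zero, showing how an apparent item effect can be produced entirely by differences in the rate of reaching the item.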

The major finding of the Schmitt and Bleistein (1987) study was that Black 
students do not complete SAT-Verbal sections at the same rate as White students 
with comparable SAT-Verbal scores. This differential speededness effect accounts 
for much of the differential item functioning for Blacks on SAT analogy items. 
When the not reached examinees were excluded from the calculation of the
standardized response rate differences, only a few analogy items exhibited enough
DIF to warrant investigation. 

Evaluation of the hypothesized causes of DIF was complicated by the fact 
that several factors were confounded, including item position within each analogy 
set, item difficulty, subject matter content, level of abstractness, and semantic 
relationship type. Two factors did seem to hold up, "vertical relationships" and 
homographs. When an item has a word or words in the stem that can be associated 
with a word in the key or any distractor which is independent of the analogical 
relationship between the two words in the stem, vertical relationships or word 
associations are present. These relationships are seen as strategies used by 
students when they are trying to guess the correct answer to an item. A 
homograph is a word that is spelled like another word, but which differs in 
meaning or pronunciation. For purposes of the study, only items that test a less
common meaning of words that are homographs were identified as homographs. In 
general, a vertical or word associative answering strategy seems to be more 
consistently used by Black examinees on items with negative DIF. Results also 
showed that as the number of homographs increased there was a tendency to have 
more negative DIF. The authors concluded that since the results were based on 
regular SAT administration data where many factors are confounded, the 
generalizations were limited, but proposed follow-up study through a rigorously 
designed investigation. 


FINDINGS ACROSS ETHNIC GROUPS 

Findings from the Asian-American studies (Kulick & Dorans, 1983; Bleistein & 
Wright, 1985, 1987) stressed the need to include only examinees who speak English 
as their best or first language in DIF studies. Accordingly, subsequent DIF 
studies with other minority students--Hispanics (Schmitt, 1985; 1988) and Blacks
(Rogers, Dorans & Schmitt, 1986; Rogers & Kulick, 1987; Schmitt & Bleistein, 1987)--
based their analyses on students who identified themselves as having English as
their best language. Comparisons across Asian-American, Hispanic, and Black DIF
results are also restricted to self-identified English-best-language students.

The main findings from the previously summarized studies were evaluated as 
to their generalizability across other ethnic groups. These findings include the 
differential speededness effect found for Black examinees on analogy items 
(Schmitt & Bleistein, 1987), the relationship between items with Hispanic content
of special interest and positive DIF (Schmitt, in press) and the frequent
occurrence of homographs and "vertical relationships" in analogy items with
negative DIF for Black examinees (Schmitt & Bleistein, 1987). Standardized
distractor analyses were computed for the Asian-American (AA), Mexican-American 
(MA), Puerto Rican (PR), and Black-White (BLK) comparisons for the SAT-Verbal 
forms taken in November 1983 and for these last three groups for the November 
1984 SAT-Verbal form. 

Differential Speededness 

Distractor analyses demonstrated that analogy items in the SAT-Verbal 1 
section (ten analogies located at the end of the 45-item Verbal section) were 
reached by a slightly larger proportion of White examinees than by Black,
Mexican-American, Puerto Rican and, to a lesser extent, Asian-American examinees
of comparable ability. No indication of differential speededness was found for
the ten analogy items which are located in the middle of the 40-item SAT-Verbal 2
section. These results extend the differential speededness findings for Black 
examinees described by Schmitt and Bleistein (1987) to Mexican-Americans and 
Puerto Ricans. Figures 3-6 present the 1983 DIF comparison between $\mathrm{DSTD}_1$ and
$\mathrm{DSTD}_2$ by SAT-Verbal section for each ethnic group. In addition, Figure 7 compares
the differential not-reached indices obtained by ethnic groups. Figure 7 should
be interpreted cautiously since each of the separate ethnic subgroup analyses is
based on different standardization weights for each focal group and thus the
values of DIF are not directly comparable. Nevertheless, Figure 7 summarizes the 
same conclusions that can be reached from Figures 3-6. Matched Asian-American/ 
White comparisons show a slightly higher proportion of Asian-American students 
answering the earlier analogy items from the SAT-Verbal Section 1. The reverse 
is true for the last analogy items of the SAT-Verbal Section 1. These items are 
reached by a higher proportion of matched White examinees. For the other three 
Ethnic/White comparisons all ten analogy items are reached by a higher proportion 
of the corresponding matched White group. The Black focal group shows the 
largest proportion of examinees differentially not reaching these analogy items. 


Insert Figures 3-7 about here 


Examination of the not-reached standardized differences for the last ten
items of the SAT-Verbal 2 section shows that the Hispanic and Black groups also
have a higher not-reached proportion than the matched White comparison group.
The last ten items of the SAT-Verbal 2 section are reading comprehension items.

Based on these results, in all subsequent analyses and comparisons between
ethnic DIF results the not-reached candidates were excluded from the calculation
of the standardized response rates in order to remove the influence of
speededness from the DIF statistic. This corrected standardization index is
labelled DSTD2. As with DSTD1, positive DSTD2 values indicate that the item is
differentially easier for that ethnic group, while negative DSTD2 values indicate
that the item is differentially harder.
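
The effect of excluding not-reached candidates can be illustrated with a minimal sketch. It assumes the usual form of the standardization index (Dorans & Kulick, 1986): a weighted average, across matched score levels, of the focal-minus-base difference in proportion correct, with the focal group count at each level as the weight. The function name, the data layout, and the counts below are hypothetical and are not taken from the SAT analyses.

```python
def standardized_difference(focal, base, exclude_not_reached=True):
    """Standardized difference in proportion correct (focal minus base).

    focal, base: one tuple per matched score level, in the same order:
    (n_examinees, n_correct, n_not_reached).
    Weights are the focal group counts at each level (an assumption).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (nf, cf, nrf), (nb, cb, nrb) in zip(focal, base):
        if exclude_not_reached:
            # Drop examinees who never reached the item from the denominators.
            nf, nb = nf - nrf, nb - nrb
        if nf <= 0 or nb <= 0:
            continue  # no usable examinees at this score level
        weighted_sum += nf * (cf / nf - cb / nb)
        weight_total += nf
    return weighted_sum / weight_total if weight_total else float("nan")


# Hypothetical counts for two score levels (illustration only).
focal = [(10, 4, 2), (20, 10, 0)]
base = [(100, 50, 0), (200, 120, 0)]

uncorrected = standardized_difference(focal, base, exclude_not_reached=False)
corrected = standardized_difference(focal, base, exclude_not_reached=True)
print(round(uncorrected, 3), round(corrected, 3))
```

In this made-up example the index moves closer to zero once the two focal examinees who did not reach the item are removed, which is the kind of speededness effect the correction is meant to take out.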







Special Interest 

Items classified as having special interest were evaluated as to their
relationship with standardized difference values. According to Schmitt (1988),
Hispanic students tend to find items differentially easier when the content
references a Hispanic or other minority group or is related to topics relevant
to the Spanish culture. The same classification used for the
Hispanic study was initially used in the analyses for all ethnic groups. In
addition, content of interest was reclassified for each individual group by
coding a "2" if the item is of special interest to that particular group,
a "1" if it is of possible interest, and a "0" if it does not have content of
interest.

Table 1 presents the correlations between the standardized differences for the
sentence completion and reading comprehension items (the only item types with a
large number of interest items) and the two interest codes (the general Hispanic
and the subsequent specific ethnic classifications). The codes are represented
as H and E, respectively.


Insert Table 1 about here 


The interest classifications for the Mexican-American comparisons remained
the same. For the Asian-American, Puerto Rican, and Black groups the items on
the 1983 reading comprehension passage about Mexican-Americans were reclassified
as of possible interest (1) instead of as of special interest (2). In addition,
interest classifications for the Black and Asian-American groups on the 1983
form were recoded as of no interest (from 1 to 0) for those items referring to
the passage about the Catholic Church and Rome. Interest classifications for
both Hispanic groups remained the same for the 1984 form. For the Black group
interest classifications for items about a passage on the accomplishments of a
Black mathematician were reclassified from possible interest (1) to special
interest (2).

Results from the correlation analyses show that for Asian-Americans there is
little relationship between content of interest and DIF, probably because no
items of special interest to Asian-Americans were present on any of the forms.
For the other three groups (Mexican-Americans, Puerto Ricans, and Blacks) there
is a positive relationship between content of interest and DIF.
Reclassification of interest codes for each specific ethnic group generally
increased this relationship or left it unchanged. The only exception is the
Black correlation on the 1983 form, which was lower for the E codes.

The standardized difference values across ethnic groups for the two sets of 
reading comprehension items with special interest are presented in Table 2. Each 
of the 5-item sets corresponds to a passage that references one of the ethnic 
groups studied. 


Insert Table 2 about here 


The Reading Comprehension item set from the November 1983 form focuses on
changes in the life-style of migrant Mexican-American families. Four out of five
items of this passage have positive DSTD2 values for the Mexican-American group,
indicating that these items are generally easier for this group when compared to a
matched White group. The item set from the November 1984 form corresponds to a
reading passage which references the work of a Black mathematician. Four of the
five items on this passage have positive DSTD2 values for all three ethnic
groups available (Mexican-American, Puerto Rican and Black). The only item in
this set with negative DIF is most extreme for Blacks, and even there the DSTD2
value is only -.03.

Homographs 

Results from the Schmitt (1988) and the Schmitt and Bleistein (1987) studies 
showed that homographs are related to differential item functioning of Hispanic 
examinees and Black examinees. Items testing less common meanings of homographs 
tend to be differentially harder for these examinees. Classifications of 
homographs for the 1983 and 1984 forms (Schmitt, 1988; Schmitt & Bleistein, 1987) 
identified each pair of words in the analogy item (stem, key and each distractor) 
as either having or not having a homograph. Values for homograph classifications 
were 0 (no homograph) or 1 (homograph). 

Correlations between DSTD2 values and homograph codes in the stem, key, and
distractors were calculated for analogy items (the only item type that had
sufficient homographs). Results for each ethnic group are presented in Table 3.
The stem and key variables took on 0 or 1 as their values. The values for the
distractor variable were obtained by summing the 0, 1 codes over the distractors
of the item. The value for the All-S, K, D variable was the sum of the stem, key,
and distractor variables. The negative correlations indicate that items with
homographs tend to be differentially harder for all ethnic groups. The results
for the 1983 form are more compelling than the results for the 1984 form.
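
The construction of the homograph variables described above can be sketched as follows. The sketch assumes that the reported correlations are ordinary Pearson correlations, which the text does not state explicitly. The function names and the item codes and DSTD values in the example are hypothetical and serve only to show the mechanics.

```python
from math import sqrt


def homograph_variables(stem, key, distractors):
    """Build the four homograph predictors for one analogy item.

    stem, key: 0 (no homograph) or 1 (homograph).
    distractors: list of 0/1 codes, one per distractor.
    """
    distractor_sum = sum(distractors)
    return {
        "stem": stem,
        "key": key,
        "distractors": distractor_sum,
        "all_skd": stem + key + distractor_sum,  # the All-S, K, D variable
    }


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Hypothetical codes and DSTD values for four analogy items (illustration only).
items = [
    homograph_variables(0, 0, [0, 0, 0, 0]),
    homograph_variables(1, 0, [0, 0, 0, 0]),
    homograph_variables(1, 0, [1, 0, 0, 0]),
    homograph_variables(1, 1, [1, 0, 0, 0]),
]
dstd = [0.02, 0.00, -0.02, -0.04]
r_all = pearson_r([item["all_skd"] for item in items], dstd)
```

A negative value of `r_all` corresponds to the pattern discussed in the text: the more homographs an item contains, the lower its standardized difference for the focal group.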


Insert Table 3 about here 

Vertical Relationships 

When an item has a word or words in the stem that can be associated with a
word in the key or any distractor, independently of the analogical
relationship between the two words in the stem, vertical relationships or word
associations are present. These relationships are seen as strategies used by
students when they are trying to guess the correct answer to an item. They were
observed as strategies used by Black students or White students when a word in
the stem is especially hard or esoteric to a particular group or when there are
other sources of confusion such as homographs present (Schmitt & Bleistein,
1987). Table 3 also presents the correlations between the DSTD2 values and
classifications of vertical relationships. It was hypothesized that vertical
relationships in distractors are negatively related to DSTD2. Vertical
relationships in the key could nevertheless make the key more attractive and thus
could be positively related to DSTD2. A consistent negative correlation between
DSTD2 and vertical relationships in the distractors is found across all groups for
the 1983 form, while a consistent positive correlation between DSTD2 and vertical
relationships in the key is found for the 1984 form. The interplay between
vertical relationships and other factors needs further evaluation.

Analyses for extreme items 

Distractor analyses are particularly helpful in studying items with more
extreme standardized differences in the key. A value of |DSTD| > .10 is considered
indicative of items with extreme standardized differences (Dorans & Kulick,
1986).

Table 4 presents items with high DSTD2 values for at least one of the ethnic
groups analyzed. Two antonym items have high positive DSTD2 values. In
addition, two analogy and two antonym items with high negative DSTD2 values are
presented. It is noteworthy that out of 170 SAT-Verbal items only six had values
of DSTD2 that exceeded .10 in absolute value.
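
The flagging rule can be written as a short sketch. The function name and dictionary layout are hypothetical; the example values are the keyed-response standardized differences printed in Table 4 for the 1983 antonym item FACILITATE.

```python
def extreme_groups(dstd_by_group, threshold=0.10):
    """Return the groups whose standardized difference exceeds the
    threshold in absolute value. Groups without data (None) are skipped."""
    return [
        group
        for group, value in dstd_by_group.items()
        if value is not None and abs(value) > threshold
    ]


# Keyed-response values for FACILITATE as printed in Table 4.
facilitate = {"AA": 0.04, "MA": 0.13, "PR": 0.19, "BLK": 0.03}
flagged = extreme_groups(facilitate)
print(flagged)
```

Under this rule an item is listed as extreme when the returned list is non-empty, that is, when at least one focal group exceeds the cutoff.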





Insert Table 4 about here 


The two extreme positive antonym items have high DSTD2 values for the
Mexican-American/White and Puerto Rican/White comparisons. No content of special
interest is present, but both items are composed of words that are true cognates.
Distractor analyses show that neither Hispanic group omits these items as much as
the matched White group does. On the first antonym item there is also a slight
tendency (particularly for the Puerto Rican group) to select distractors B and C
less often than the matched White group. Figure 8 presents distractor plots
for each of the alternatives chosen by the Puerto Rican group on this item.
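
A standardized distractor analysis of this kind can be sketched by applying the same focal-weighted standardization to every response category, including omits. This is a minimal illustration under that assumption; the function name, data layout, and counts are hypothetical and are not SAT data.

```python
def standardized_option_differences(focal_counts, base_counts):
    """Standardized response-rate difference for each response option.

    focal_counts, base_counts: lists with one dict per matched score level,
    mapping option label (e.g. "A", "omit") to the number of examinees.
    Weights are the focal group counts at each level (an assumption).
    """
    options = sorted({o for level in list(focal_counts) + list(base_counts) for o in level})
    totals = {o: 0.0 for o in options}
    weight_total = 0.0
    for f, b in zip(focal_counts, base_counts):
        nf, nb = sum(f.values()), sum(b.values())
        if nf == 0 or nb == 0:
            continue  # level cannot be matched
        weight_total += nf
        for o in options:
            totals[o] += nf * (f.get(o, 0) / nf - b.get(o, 0) / nb)
    if weight_total == 0:
        return {o: float("nan") for o in options}
    return {o: totals[o] / weight_total for o in options}


# Hypothetical counts at two score levels (illustration only).
focal = [{"A": 2, "B": 6, "omit": 2}, {"A": 10, "B": 8, "omit": 2}]
base = [{"A": 30, "B": 60, "omit": 10}, {"A": 120, "B": 70, "omit": 10}]
diffs = standardized_option_differences(focal, base)
```

Because every examinee falls into exactly one response category at each score level, the differences sum to zero across categories. A positive value for the key must therefore be offset by negative values for distractors or omits, which is the pattern the distractor plots make visible.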


Insert Figure 8 about here 


The first of the two extreme negative analogy items was classified as having
a "vertical relationship" on distractor C, "seed:flower". This distractor
differentially draws more responses from most of the ethnic groups. Puerto Rican
and Black students selected this distractor differentially more than the matched
White comparison group. For both of these groups the item is differentially
harder. Figure 9 presents the Black distractor plots for this analogy item (the
Puerto Rican plots are very similar).


Insert Figure 9 about here 


For the second analogy item none of the distractors seems to be
differentially drawing any of these ethnic groups. In addition, even though the
word "CUMULUS" in the stem is difficult and could preclude students from making
the needed analogical relationship with "CLOUD" in order to select the correct
answer, "evergreen:tree", only the Asian-American ethnic group omitted the item
at a rate differentially higher than the matched group of Whites.

Distractor analyses for the two negative antonym items do not show any
differential omit rates for any of the three ethnic groups studied
(Asian-American analyses were not available for the 1984 form). For the first
item there is a slight differential selection of distractor E, "delude", by both
Hispanic groups, but no logical rationale for its selection can be deduced. The
source of differential functioning could be a differential vocabulary difficulty
of the stem "ENUNCIATE" and the key "slur". The second antonym has a distractor
that clearly draws differentially more students from the three ethnic groups
available. Distractor A, "difficult to learn", could be differentially
selected more because students in these groups might be confusing the stem
"PRACTICAL" with "practice". Neither homographs nor content of interest are
present in either of these two items. The Black distractor plots for this antonym
item are presented in Figure 10 and vividly demonstrate how Blacks with scaled
scores below 500 are drawn toward the distractor "difficult to learn".


Insert Figure 10 about here 

CONCLUSIONS 

Since the results presented are based on item characteristics found on the
November 1983 and 1984 SAT-Verbal forms, the frequency of content of interest,
homographs, and vertical relationships is limited and restricts our conclusions.
The present findings are, nevertheless, very suggestive in that there are item
characteristics that are related to DIF and do generalize across ethnic groups.
These characteristics are content of interest, homographs, and to some extent,
vertical relationships.

Items with content of interest tend to be differentially easier for Hispanic 
and Black examinees. Content of interest was mostly found on sentence completion 
and reading comprehension item types. 

Homographs, words that have more than one meaning in English, were
negatively related to DIF. All ethnic groups studied tend to find items with
homographs differentially harder. As the number of homographs in an item
increases the item also becomes differentially harder. Even though the frequency
of homographs is greater in the analogy item type, they are also frequently found
in antonym items.

The relationship between vertical associations and DIF is not as clear as 
for the other factors; it seems to be form dependent and might be related to 
other item characteristics in its effect on DIF. It might be more a response 
strategy that is used only when examinees have difficulty in answering analogies. 
As mentioned by Schmitt and Bleistein (1987) this strategy is not used only by 
minority examinees. The item presented in Figure 2 is an analogy item that was 
differentially easier for Black examinees. The stem of this item is 
DASHIKI:GARMENT. Dashiki is an African word referring to a type of garment. 
Distractor analyses showed that the distractor "hat:coat" was selected by fewer 
Blacks than Whites of comparable ability. This distractor has a vertical 
association with garment and thus became a popular choice for students for whom 
the word dashiki was esoteric. Closer examination of vertical associations and 
other factors that might be related to their use needs to be pursued. 

Distractor analyses for extreme items supported hypothesized factors
(homographs, vertical relationships) for some cases, but in other items with
negative DIF no apparent reason for the DIF was evident. These results
demonstrate that even though some causes of DIF have been identified, other
variables still need to be identified.

Perhaps the most important finding was the detection of differential 
speededness across ethnic groups for items appearing at the end of each 
SAT-Verbal section. Blacks and Hispanics tend to have fewer examinees responding 
to items at the end of a section than do Whites of comparable SAT-Verbal score. 
This differential speededness contributes to the appearance of differential item 
functioning for items at the end of test sections. However, this DIF may be more a
function of item location than of item characteristics that interact with group
membership.


FUTURE DIRECTIONS 

This paper summarizes the differential item functioning research on the SAT
that has been performed on minority examinee subgroups using the standardization
method. The College Board and the Educational Testing Service are about to embark
on a large scale implementation of operational differential item functioning work
on the SAT. Both standardization indices and the Mantel-Haenszel statistic will
be routinely computed on pretest items for each of five focal groups:
Asian-Americans, Blacks, Hispanics, American Indians, and females. These data are
bound to contain confirmations and refutations of some of the findings described
herein. With such a plethora of data, meta-analytic schemes for analyzing the
data become of paramount importance.

As DIF implementation moves swiftly along at ETS, it becomes clear that
several fundamental issues require more attention. Foremost among these is the
issue of focal group definition. To date, focal groups have been intact,
easily-defined groups such as Blacks or Hispanics. It could be argued, however,
that these intact ethnic groups are merely surrogates for an educational
disadvantage attribute that should be used in focal group definition. This
argument echoes that made a decade ago in the American Psychologist by Novick and
Ellis (1977), where a strong case was made for "the explicit identification of
those attributes that constitute disadvantage, rather than accepting group
membership as a surrogate for disadvantage" (p. 318). Novick and Ellis
acknowledged that the problems of understanding what constitutes disadvantage and
being able to measure it adequately were formidable. They still are.
Significant advances in DIF research may depend on serious efforts to solve such
problems.




REFERENCES

Bleistein, C. A., & Wright, D. (1987). Assessment of unexpected differential
item difficulty for Asian-American candidates on the Scholastic Aptitude
Test. In A. P. Schmitt & N. J. Dorans (Eds.), Differential item functioning
on the Scholastic Aptitude Test (RM-87-1). Princeton, NJ: Educational
Testing Service.

Carlton, S. T., & Marco, G. L. (1982). Methods used by test publishers to
"debias" standardized tests: Educational Testing Service. In R. A. Berk
(Ed.), Handbook of Methods for Detecting Test Bias. Baltimore, MD: Johns
Hopkins Press.

Donlon, T. F. (1981). The SAT in a diverse society: Fairness and sensitivity.
The College Board Review, No. 122, 16-21, 30-32.

Donlon, T. F. (Ed.) (1984). The College Board Technical Handbook for the
Scholastic Aptitude Test and Achievement Tests. New York: College Board
Publications.

Dorans, N. J. (1982). Technical review of item fairness studies: 1975-1979
(ETS Statistical Report SR-82-90). Princeton, NJ: Educational Testing
Service.

Dorans, N. J. (1987). Two new approaches to assessing unexpected differential
item performance: Standardization and the Mantel-Haenszel method. In A. P.
Schmitt & N. J. Dorans (Eds.), Differential item functioning on the
Scholastic Aptitude Test (RM-87-1). Princeton, NJ: Educational Testing
Service.

Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item
performance of female candidates on SAT and TSWE forms administered in
December 1977: An application of the standardization approach (RR-83-9).
Princeton, NJ: Educational Testing Service.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the
standardization approach to assessing unexpected differential item
performance on the Scholastic Aptitude Test. Journal of Educational
Measurement, 23, 355-368.

Echternacht, G. (1972). An examination of test bias and response
characteristics for six candidate groups taking the ATGSB (RR-72-4).
Princeton, NJ: Educational Testing Service.

Holland, P. W., & Thayer, D. (1986, April). Differential item performance and
the Mantel-Haenszel statistic. Paper presented at the annual meeting of the
American Educational Research Association, San Francisco.

Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and
the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test
validity. Hillsdale, NJ: Erlbaum.

Kulick, E. (1984). Assessing unexpected differential item performance of Black
candidates on SAT form CSA6 and TSWE form E33 (SR-84-80). Princeton, NJ:
Educational Testing Service.

Kulick, E., & Dorans, N. J. (1983). Assessing unexpected differential item
performance of Oriental candidates on SAT form CSA6 and TSWE form E33
(SR-83-106). Princeton, NJ: Educational Testing Service.

Mantel, N., & Haenszel, W. M. (1959). Statistical aspects of the analysis of
data from retrospective studies of disease. Journal of the National Cancer
Institute, 22, 719-748.

Novick, M. R., & Ellis, D. D. (1977). Equal opportunity in educational and
employment selection. American Psychologist, 32, 306-320.

Rogers, J., Dorans, N. J., & Schmitt, A. P. (1986). Assessing unexpected
differential item performance of black candidates on SAT form 3GSA08 and
TSWE form E43 (SR-86-22). Princeton, NJ: Educational Testing Service.

Rogers, H. J., & Kulick, E. (1987). An investigation of unexpected differences
in item performance between Blacks and Whites taking the SAT. In A. P.
Schmitt & N. J. Dorans (Eds.), Differential item functioning on the
Scholastic Aptitude Test (RM-87-1). Princeton, NJ: Educational Testing
Service.

Scheuneman, J. D. (1978). Ethnic group bias in intelligence test items. In
S. W. Lundsteen (Ed.), Cultural factors in learning and instruction. New
York: ERIC Clearinghouse on Urban Education, Diversity Series, No. 56.

Scheuneman, J. D. (1981). A response to Baker's criticism. Journal of
Educational Measurement, 16, 143-152.

Schmitt, A. P. (1985). Assessing unexpected differential item performance of
Hispanic candidates on SAT form 3FSA08 and TSWE form E47 (SR-85-169).
Princeton, NJ: Educational Testing Service.

Schmitt, A. P. (1988). Language and cultural characteristics that explain
differential item functioning for Hispanic examinees on the Scholastic
Aptitude Test. Journal of Educational Measurement, 25, 1-13.

Schmitt, A. P., & Bleistein, C. A. (1987). Factors Affecting Differential Item
Functioning for Black Examinees on Scholastic Aptitude Test Analogy Items
(RR-87-23). Princeton, NJ: Educational Testing Service.

Schmitt, A. P., & Dorans, N. J. (1987). Differential Item Functioning on the
Scholastic Aptitude Test (RM-87-1). Princeton, NJ: Educational Testing
Service.

Stricker, L. J. (1982). Identifying test items that perform differently in
population subgroups: A partial correlation index. Applied Psychological
Measurement, 6, 261-273.

Wright, D. (1987). An empirical comparison of the Mantel-Haenszel and
standardization methods of detecting differential item performance. In
A. P. Schmitt & N. J. Dorans (Eds.), Differential Item Functioning on the
Scholastic Aptitude Test (RM-87-1). Princeton, NJ: Educational Testing
Service.





Table 1

Correlations between Standardized
Difference (DSTD2) and Interest


                                  1983                       1984²
Item Type       Code¹     AA     MA     PR     BLK       MA     PR     BLK

Sent. Com.³     H         --     --     --     --        .65    .62    .11
                E         --     --     --     --        .65    .62    .11

Read. Com.      H         .11    .56    .20    .53       .22    .24    .34
                E        -.04    .56    .34    .37       .22    .24    .40


¹Code H refers to interest as coded in the Hispanic study (Schmitt, in press).
 Code E refers to the reclassification of interest by specific ethnic group.

²Asian-American DIF analyses are not available for the 1984 form.

³There was only one item classified as of interest for the Sentence Completion
 items of the 1983 form.







Table 2

Standardized Differences for Reading Comprehension
Items for Two Reading Passages


                      DSTD2                  Read. Com.
Form     AA¹     MA      PR      BLK         Item        Passage Content

1983     .03     .06     .01     -.02        78          Migrant Mexican-American
        -.01     .04     .01      .01        79          Workers
         .01     .01     [illegible]  -.02   80
         .01     .05     .01     -.04        81
        -.02     .00     .00     -.00        82

1984     --      .05     .05      .06        21          Black Mathematician
         --      .06     .02      .06        22
         --      .04     .04      .05        23
         --     -.00    -.01      .03        24
         --      .02     .05      .08        25


¹Asian-American DIF analyses are not available for the 1984 form.





Table 3

Correlations Between Standardized Difference (DSTD2)
and Predicted Variables for Analogies


                                       Form
                            1983                      1984¹
Variables          AA     MA     PR     BLK       MA     PR     BLK

Homograph
  Stem             .05    .27   -.28   -.31      -.05   -.02   -.13
  Key             -.31    .26   -.34   -.41
  Distractors     -.46    .42   -.37   -.31      -.26   -.30   -.32
  All-S,K,D       -.41    .49   -.49   -.49      -.26   -.28   -.35

Vertical
  Key             -.05    .28   -.18   -.27       .14    .24    .16
  Distractors     -.22    .30   -.46   -.50      -.08    .06   -.13


¹Asian-American DIF analyses are not available for the 1984 form.







Table 4

Items with High Standardized Difference (DSTD2)


Item                        DSTD2                                              Item Characteristic
Type      Form    AA¹     MA      PR      BLK     Item                         Int.   Homog.   Vert. R.

Antonym   1983                                    FACILITATE:                  0      0        0
                  -.01    -.01     .01    -.01    (A) intensify
                  -.02    -.03     .04    -.01    (B) mobilize
                  -.01    -.03     .06    -.00    (C) decline
                   .04     .13     .19     .03    (D) complicate
                   .00    -.00     .01     .00    (E) meditate
                  -.00    -.06     .06    -.01    Omitted

Antonym   1984                                    AGGRANDIZEMENT:              0      0        0
                   --     -.01     .01     .00    (A) assessment
                   --      .01     .00     .01    (B) leniency
                   --      .01     .01     .02    (C) restitution
                   --      .00     .02    -.00    (D) annulment
                   --      .06     .11     .02    (E) diminution
                   --     -.07     .08    -.05    Omitted

Analogy   1983                                    BARK:TREE::                  0      Stem     Key
                  -.05    -.08     .14    -.10    (A) skin:fruit                      E        C
                  -.00     .00     .01     .01    (B) dew:grass
                   .02     .04     .08     .06    (C) seed:flower
                   .02     .02     .02     .03    (D) peak:hill
                   .01     .02     .02     .01    (E) wake:boat
                   .00     .00     .01     .00    Omitted

Analogy   1983                                    CUMULUS:CLOUD::              0      0        0
                   .01     .03     .01     .01    (A) lake:ocean
                   .00     .03     .01     .01    (B) carnivore:meat
                   .01     .02     .03     .02    (C) glacier:blizzard
                  -.06    -.11     .10    -.08    (D) evergreen:tree
                   .01     .01     .02     .02    (E) evening:daylight
                   .03     .02     .01     .02    Omitted

Antonym   1984                                    ENUNCIATE:                   0      0        0
                   --      .01     .02     .02    (A) detach
                   --     -.08     .11    -.10    (B) slur
                   --      .01     .02     .02    (C) disfigure
                   --      .01     .03     .02    (D) cloister
                   --      .04     .04     .02    (E) delude
                   --      .01     .01     .01    Omitted

Antonym   1984                                    PRACTICAL:                   0      0        0
                   --      .04     .09     .12    (A) difficult to learn
                   --      .00     .00     .00    (B) inferior in quality
                   --      .01     .01     .01    (C) providing great support
                   --     -.05     .11    -.16    (D) having little usefulness
                   --      .00     .00     .00    (E) feeling great regret
                   --     -.00     .01     .01    Omitted


¹Data for Asian-Americans who took the 1984 form is not available.



ERRATUM (RR-88-32)


Differential Item Functioning for
Minority Examinees on the SAT

Alicia P. Schmitt
Neil J. Dorans


Table 4

Items with High Standardized Difference (DSTD2)


Item                        DSTD2                                              Item Characteristic
Type      Form    AA¹     MA      PR      BLK     Item                         Int.   Homog.   Vert. R.

Antonym   1983                                    FACILITATE:                  0      0        0
                  -.01    -.01     .01    -.01    (A) intensify
                  -.02    -.03     .04    -.01    (B) mobilize
                  -.01    -.03     .06    -.00    (C) decline
                   .04     .13     .19     .03    (D)* complicate
                   .00    -.00     .01     .00    (E) meditate
                  -.00    -.06     .06    -.01    Omitted

Antonym   1984                                    AGGRANDIZEMENT:              0      0        0
                   --     -.01     .01     .00    (A) assessment
                   --      .01     .00     .01    (B) leniency
                   --      .01     .01     .02    (C) restitution
                   --      .00     .02    -.00    (D) annulment
                   --      .06     .11     .02    (E)* diminution
                   --     -.07     .08    -.05    Omitted

Analogy   1983                                    BARK:TREE::                  0      Stem     Key
                  -.05    -.08     .14    -.10    (A)* skin:fruit                     E        C
                  -.00     .00     .01     .01    (B) dew:grass
                   .02     .04     .08     .06    (C) seed:flower
                   .02     .02     .02     .03    (D) peak:hill
                   .01     .02     .02     .01    (E) wake:boat
                   .00     .00     .01     .00    Omitted

Analogy   1983                                    CUMULUS:CLOUD::              0      0        0
                   .01     .03     .01     .01    (A) lake:ocean
                   .00     .03     .01     .01    (B) carnivore:meat
                   .01     .02     .03     .02    (C) glacier:blizzard
                  -.06    -.11     .10    -.08    (D)* evergreen:tree
                   .01     .01     .02     .02    (E) evening:daylight
                   .03     .02     .01     .02    Omitted

Antonym   1984                                    ENUNCIATE:                   0      0        0
                   --      .01     .02     .02    (A) detach
                   --     -.08     .11    -.10    (B)* slur
                   --      .01     .02     .02    (C) disfigure
                   --      .01     .03     .02    (D) cloister
                   --      .04     .04     .02    (E) delude
                   --      .01     .01     .01    Omitted

Antonym   1984                                    PRACTICAL:                   0      0        0
                   --      .04     .09     .12    (A) difficult to learn
                   --      .00     .00     .00    (B) inferior in quality
                   --      .01     .01     .01    (C) providing great support
                   --     -.05     .11    -.16    (D)* having little usefulness
                   --      .00     .00     .00    (E) feeling great regret
                   --     -.00     .01     .01    Omitted


*Keyed response.

¹Data for Asian-Americans who took the 1984 form is not available.









[FIGURE 4a - FORM 3H / V1 ANALOGY ITEMS and FIGURE 4b - FORM 3H / V2 ANALOGY
ITEMS: standardized differential item functioning values (DSTD) plotted by
item number. Plots not reproduced.]

[FIGURE 6a - FORM 3H / V1 ANALOGY ITEMS and FIGURE 6b - FORM 3H / V2 ANALOGY
ITEMS: standardized differential item functioning values (DSTD) for
Black/White DIF plotted by item number. Plots not reproduced.]












[FIGURE 8a - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Antonym Item. Item #50 (Form 3H). Panels for
response options, including A, B, and C: percent responding by scaled score
(200-800). Legend: + PUERTO RICAN, ENGBL-MAIN; o WHITE, ENGBL. Plots not
reproduced.]

[FIGURE 8b - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Antonym Item. Item #50 (Form 3H). Remaining
response options and OMIT (percent omitting by scaled score; standardized
difference = -0.0646). Legend: + PUERTO RICAN, ENGBL-MAIN; o WHITE, ENGBL.
Plots not reproduced.]

[FIGURE 8c - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Antonym Item. Item #50 (Form 3H). NOT REACHED
(percent not reaching by scaled score; standardized difference = 0.0003).
Plots not reproduced.]

[FIGURE 9a - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Analogy Item. Item #61 (Form 3H). Percent
responding by scaled score (200-800). Legend: + BLACK, ENGBL; o WHITE, ENGBL.
Plots not reproduced.]

[FIGURE 9b - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Analogy Item. Item #61 (Form 3H). Panels D, E,
and OMIT by scaled score (200-800). Legend: + BLACK, ENGBL; o WHITE, ENGBL.
Plots not reproduced.]

[FIGURE 9c - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Analogy Item. Item #61 (Form 3H). NOT REACHED
(percent not reaching by scaled score). Legend: + BLACK, ENGBL; o WHITE,
ENGBL. Plots not reproduced.]

[FIGURE 10c - Empirical Option Curves and Standardized Response Rate
Differences for an SAT-Verbal Antonym Item. Item #48 (Form 4H). NOT REACHED
(percent not reaching by scaled score). Legend: + BLACK, ENGBL; o WHITE,
ENGBL. The captions of the preceding Figure 10 panels for Item #48 (percent
responding and percent omitting) did not survive. Plots not reproduced.]




